Today we are going to
1) be investigating the results from the alignment and the differential expression analysis that you went through on the previous occassion - do these make sense? do we trust them?
2) learn how to use RMarkdown to more easily generate reproducible results
3) learn how to use ggplot2 to make, and annotate, publication ready plots (PDF)
- PCA (incl gene loadings)
- Volcano plot
4) Use IGV and Gviz to visualise coverage of genes
The idea is that you will be given most of the R code, but unlike the previous day you will be forced to “copy and paste”, as well as think about what the code is doing.
During the course of the day we will go through the exercises in this document, and there will be three types of “exercises”:(taken from previous day!)
We will continue looking at the data from the previous day, eg YAP and TAZ control peripheral myelination and the expression of laminin receptors in Schwann cells. Poitelon et al. Nature Neurosci. 2016. (http://www.nature.com/neuro/journal/v19/n7/abs/nn.4316.html) In the experiments performed in this study, YAP and TAZ were knocked-down in Schwann cells to study myelination, using the sciatic nerve in mice as a model. The material for RNA-seq was collected from 2 conditions (wt and YAP(kd)TAZ(kd)), each in 3 biological replicates:
| Accession | Condition | Replicate |
|---|---|---|
| SRR3222409 | KO | 1 |
| SRR3222410 | KO | 2 |
| SRR3222411 | KO | 3 |
| SRR3222412 | WT | 1 |
| SRR3222413 | WT | 2 |
| SRR3222414 | WT | 3 |
Use RMarkdown for reproducible results as a way to connect your data with your “results” (eg plots), and discussion points.
RStudio + RMarkdown
1) https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
2) http://rmarkdown.rstudio.com/authoring_basics.html
3) http://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
R
1) Quick-R (for example http://www.statmethods.net/stats/index.html and http://www.statmethods.net/management/sorting.html)
2) http://www.cyclismo.org/tutorial/R/
When you have a small simple file like this re-knitting every time is useful, but as you progress to more and more complex, and longer, R Markdown files, you could run your commands directly in RStudio and only re-knit once you want to see how your HTML page looks like. So go to the plot command of your RMarkdown file and press
Command and Enter
you will now see the plot in the bottom right corner of your RStudio
The first you should do is to ‘setup’ your environment, and this includes to load all required packages, and determine how the output should look like.
In general I find it easier to view “dependencies” by loading all R packages at the top, no matter how far down your document you will use them… Also I tend to add a comment (# to the right) of why I need a specific package, and also the link to the web-page containing the examples on how to use the package. The first time you load a package, it might not exists on your computer, so the below is a good code for testing if the package exists, and if not install it.
Note there are, at least, three major sources, or R packages
* cran
* bioconductor
* github
and they have slighly different installation instructions (for example install.packages vs biocLite) and load (library vs require)
if (!require("RColorBrewer")) { # test if the package can be found
install.packages("RColorBrewer", dependencies = TRUE); # install the package, if needed
}
library(RColorBrewer); # load the package
The full list of packages:
library(dplyr); # summarise your data
library(ggplot2); # ggplot
library(knitr); # tables with more options on what they will look like...
library(limma); # voom
library(VennDiagram); # https://rstudio-pubs-static.s3.amazonaws.com/13301_6641d73cfac741a59c0a851feb99e98b.html
library(grid); # required by make3wayVenn
source("https://bioconductor.org/biocLite.R"); # bioconductor is a "collection" of many bioinformatics packages
library(Gviz);
library(GenomicRanges); # used by Gviz
library(BSgenome.Hsapiens.UCSC.hg19); # used by Gviz
Define where RStudio should be looking for additional files
setwd("/Users/halena/Documents/WABI/Teaching/2016_GU_Seqcourse");
(they can also be found on ruby on /home/teacher5/rna-seq-course)
List all the files in the current working directory, there are two commands that means the same thing list.files() and dir.
list.files() # list ALL files
dir(pattern="*bam") # list all BAM files
You will see again and again that there are multiple ways of achieving the same results in R, you have to figure out which one makes the most sense to you.
Its important to quality control (QC) your results, and this is very easily done in R by “viewing” your data.
tableCounts <- read.table(file="tableCounts", sep="\t", header=TRUE, skip=1);
Look at the values in the table
head(tableCounts);
## Geneid Chr Start End Strand Length
## 1 ENSMUSG00000102693 1 3073253 3074322 + 1070
## 2 ENSMUSG00000064842 1 3102016 3102125 + 110
## 3 ENSMUSG00000051951 1 3205901 3671498 - 465598
## 4 ENSMUSG00000102851 1 3252757 3253236 + 480
## 5 ENSMUSG00000103377 1 3365731 3368549 - 2819
## 6 ENSMUSG00000104017 1 3375556 3377788 - 2233
## SRR3222409_Aligned.out.sam SRR3222410_Aligned.out.sam
## 1 0 0
## 2 0 0
## 3 41 31
## 4 0 0
## 5 0 0
## 6 0 0
## SRR3222411_Aligned.out.sam SRR3222412_Aligned.out.sam
## 1 0 0
## 2 0 0
## 3 46 90
## 4 0 0
## 5 0 0
## 6 0 0
## SRR3222413_Aligned.out.sam SRR3222414_Aligned.out.sam
## 1 0 0
## 2 0 0
## 3 83 83
## 4 0 0
## 5 0 0
## 6 0 0
If you are unsure about how a specific command is written you can always call help in either R or RStudio
help(read.table)
In order to look at the data in a better way inside RStudio, although it wont work inside your RMarkdown document
View(tableCounts)
Different ways of using max to get the answer you need.
max(tableCounts[,c("SRR3222409_Aligned.out.sam", "SRR3222410_Aligned.out.sam", "SRR3222411_Aligned.out.sam", "SRR3222412_Aligned.out.sam", "SRR3222413_Aligned.out.sam", "SRR3222414_Aligned.out.sam")]) # will work but is awkward
## [1] 313845
max(tableCounts[,7:12]) # will work but if you add a column it will break!
## [1] 313845
# for each gene get the max count
maxCountPerGene <- summarise(group_by(tableCounts, Geneid), maxCount=max(SRR3222409_Aligned.out.sam, SRR3222410_Aligned.out.sam, SRR3222411_Aligned.out.sam, SRR3222412_Aligned.out.sam, SRR3222413_Aligned.out.sam, SRR3222414_Aligned.out.sam));
max(maxCountPerGene$maxCount); # get the max count from this directly...
## [1] 313845
# use head and order your data
head(maxCountPerGene[order(maxCountPerGene$maxCount, decreasing=TRUE),], n=1);
## # A tibble: 1 <U+00D7> 2
## Geneid maxCount
## <fctr> <int>
## 1 ENSMUSG00000056569 313845
check that it did what you expected it to
head(maxCountPerGene);
## # A tibble: 6 <U+00D7> 2
## Geneid maxCount
## <fctr> <int>
## 1 ENSMUSG00000000001 1711
## 2 ENSMUSG00000000003 0
## 3 ENSMUSG00000000028 123
## 4 ENSMUSG00000000031 19508
## 5 ENSMUSG00000000037 45
## 6 ENSMUSG00000000049 5
how is the gene expressed across conditions?
subset(tableCounts, Geneid %in% "ENSMUSG00000000001"); # test that the max count is indeed right
## Geneid Chr Start End Strand Length
## 11835 ENSMUSG00000000001 3 108107280 108146146 - 38867
## SRR3222409_Aligned.out.sam SRR3222410_Aligned.out.sam
## 11835 922 759
## SRR3222411_Aligned.out.sam SRR3222412_Aligned.out.sam
## 11835 1037 1711
## SRR3222413_Aligned.out.sam SRR3222414_Aligned.out.sam
## 11835 1557 1329
subset(tableCounts, Geneid %in% "ENSMUSG00000056569");
## Geneid Chr Start End Strand Length
## 2875 ENSMUSG00000056569 1 171150711 171161130 + 10420
## SRR3222409_Aligned.out.sam SRR3222410_Aligned.out.sam
## 2875 44051 58764
## SRR3222411_Aligned.out.sam SRR3222412_Aligned.out.sam
## 2875 57719 249437
## SRR3222413_Aligned.out.sam SRR3222414_Aligned.out.sam
## 2875 211064 313845
This will make it easier to access the alignment data later on
onlyCounts = tableCounts[,7:12];
rownames(onlyCounts) = tableCounts$Geneid;
# how many genes do we have at all?
nrow(maxCountPerGene);
## [1] 46078
# how many genes have a count of zero?
nrow(subset(maxCountPerGene, maxCount==0))
## [1] 25804
Now you actually have quite a few plots, so in order to find your way around the HTML page, add a table of content by modifying your header section.
Have a look at how to specify your header section here: http://rmarkdown.rstudio.com/html_document_format.html
Once you added the TOC, go ahead and re-knit your document
Did you like the layout of the tableCounts table? No, then lets do something about it
kable(head(tableCounts, row.names=FALSE));
| Geneid | Chr | Start | End | Strand | Length | SRR3222409_Aligned.out.sam | SRR3222410_Aligned.out.sam | SRR3222411_Aligned.out.sam | SRR3222412_Aligned.out.sam | SRR3222413_Aligned.out.sam | SRR3222414_Aligned.out.sam |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000102693 | 1 | 3073253 | 3074322 | + | 1070 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSMUSG00000064842 | 1 | 3102016 | 3102125 | + | 110 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSMUSG00000051951 | 1 | 3205901 | 3671498 | - | 465598 | 41 | 31 | 46 | 90 | 83 | 83 |
| ENSMUSG00000102851 | 1 | 3252757 | 3253236 | + | 480 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSMUSG00000103377 | 1 | 3365731 | 3368549 | - | 2819 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSMUSG00000104017 | 1 | 3375556 | 3377788 | - | 2233 | 0 | 0 | 0 | 0 | 0 | 0 |
Lets rename the counts columns to something more useful.
names(tableCounts)[7:12] <- c("KO_1","KO_2", "KO_3", "WT_1", "WT_2", "WT_3");
Lets look at 10 genes with actual KO expression levels…
kable(head(tableCounts[order(tableCounts$KO_1, decreasing=TRUE),], n=10), row.names=FALSE);
| Geneid | Chr | Start | End | Strand | Length | KO_1 | KO_2 | KO_3 | WT_1 | WT_2 | WT_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSMUSG00000101111 | 1 | 24613189 | 24613971 | - | 783 | 54423 | 30464 | 44254 | 49345 | 49269 | 58395 |
| ENSMUSG00000100862 | 1 | 24613974 | 24614651 | - | 678 | 54071 | 33384 | 45354 | 52416 | 50684 | 58233 |
| ENSMUSG00000026043 | 1 | 45311538 | 45349706 | + | 38169 | 46209 | 29860 | 42177 | 53756 | 43728 | 65661 |
| ENSMUSG00000056569 | 1 | 171150711 | 171161130 | + | 10420 | 44051 | 58764 | 57719 | 249437 | 211064 | 313845 |
| ENSMUSG00000102070 | 1 | 24614885 | 24615565 | - | 681 | 43041 | 25655 | 35137 | 39980 | 39521 | 43668 |
| ENSMUSG00000101249 | 1 | 24615706 | 24616197 | - | 492 | 41612 | 24487 | 33500 | 38132 | 38003 | 46898 |
| ENSMUSG00000001506 | 11 | 94936224 | 94953042 | + | 16819 | 39374 | 31226 | 37594 | 48452 | 40512 | 45238 |
| ENSMUSG00000029661 | 6 | 4504814 | 4541543 | + | 36730 | 37772 | 26555 | 34362 | 41615 | 36010 | 43936 |
| ENSMUSG00000018593 | 11 | 55394500 | 55420080 | - | 25581 | 35032 | 26919 | 33036 | 45051 | 40312 | 46417 |
| ENSMUSG00000037742 | 9 | 78478449 | 78489151 | - | 10703 | 29426 | 22946 | 27972 | 41506 | 33347 | 33289 |
Now you have both the results (in the form of plots), along with the commands use to generate the plots, and even, in some ways, the data that was the basis for the plots, in the same document. In order to make your results even more re-producable, start adding comments, like figure legends to your plots.
In order to make a volcano plot, you need to read in the differential expression results from the previous day, and then you could technically simply plot it, but in general you are interested in visually highlighting the differentially regulated genes.
Simply use read.table
diffExp <- read.table("results_DE.txt",sep="\t",header=TRUE);
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec =
## dec, : EOF within quoted string
## Warning in scan(file = file, what = what, sep = sep, quote = quote, dec =
## dec, : number of items read is not a multiple of the number of columns
nrow(diffExp);
## [1] 6400
Then on the command line outside RStudio check the number of lines in the file against the above count
wc -l results_DE.txt
The above wont work as there are lots of ’ in the file! this is a general problem with reading files into R, wheras the below will work.
diffExp <- read.table("results_DE.txt", sep="\t", header=TRUE, quote="");
subset(diffExp, mgi_symbol %in% c("Taz","Yap1"))
## ensembl_gene_id chromosome_name mgi_symbol
## 1997 ENSMUSG00000053110 9 Yap1
## 9633 ENSMUSG00000009995 X Taz
## description
## 1997 yes-associated protein 1 [Source:MGI Symbol;Acc:MGI:103262]
## 9633 tafazzin [Source:MGI Symbol;Acc:MGI:109626]
## logFC logCPM LR PValue FDR
## 1997 -0.29413201 6.891383 6.0503383 0.01390363 0.09743683
## 9633 -0.09640493 4.893313 0.3092547 0.57813787 0.83972176
diffExp$threshold = as.factor(diffExp$FDR<=0.01 & abs(diffExp$logFC)>=2)
g <- ggplot(data=diffExp, aes(x=logFC, y=-log10(PValue), colour=threshold)) +
geom_point(alpha=0.4, size=1.75) +
theme(legend.position = "none") +
xlim(c(-10, 10)) + ylim(c(0, 15)) +
xlab("log2 fold change") + ylab("-log10 p-value")
print(g);
## Warning: Removed 82 rows containing missing values (geom_point).
Here we will add mouse symbols on top of the circles for the significantly expressed genes only, as adding it for all 13.000 genes will become unreadable.
genesSignificant <- diffExp[diffExp$threshold==TRUE,]; # first pull out the significant genes
g + geom_text(data=genesSignificant, aes(label= mgi_symbol), size=1.2, colour="black");
## Warning: Removed 82 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing missing values (geom_text).
The above works as genesSignificant looks exactly the same (names of columns) as diffExp, which is because its a subset of the rows in it.
make3wayVenn <- function(A, B, C, categories){
AB=intersect(x=A, y=B);
AC=intersect(x=A, y=C);
BC=intersect(x=B, y=C);
overlap = intersect(x=AB, y=C);
grid.newpage();
v <- draw.triple.venn(area1=length(A),area2=length(B),area3=length(C),
n12=length(AB), n13=length(AC), n23=length(BC),
n123=length(overlap),
category = categories, scaled=FALSE,
print.mode=c('raw','percent'), sigdigs=0,
lty = 'blank',
fill = c('skyblue', 'pink1', 'mediumorchid'));
grid.draw(v);
}
expressedInKO1 <- subset(tableCounts, KO_1>10);
expressedInKO2 <- subset(tableCounts, KO_2>10);
expressedInKO3 <- subset(tableCounts, KO_3>10);
make3wayVenn(expressedInKO1$Geneid, expressedInKO2$Geneid, expressedInKO3$Geneid,
categories=c("KO_1","KO_2","KO_3"));
Other Venn packages
In my opinion Venn diagrams is one of the plots you are asked for again and again that R does not provide a really good answer for. If you Google for venn r you will get a long list of possible packages. I like the one we used here because it gives me the possibility to easily set the layout of it, and to show the percentage as well.
Take the normalised data (here we use log2), decide the annotations you want to add to the plot
pseudoCount = log2(onlyCounts + 1);
annotations <- data.frame(Condition=c("KO","KO", "KO", "WT", "WT", "WT"),
Names=names(pseudoCount));
Actually run the PCA analsysis
pseudoCount.t <- t(pseudoCount); # transpose the data
pcaObject <- prcomp(pseudoCount.t); # perform the PCA analysis
Add the annotations to the pca results, calculate how much each principal component contributes
X <- merge(x = pcaObject$x, y = annotations,
by.x="row.names", by.y = "Names", all.x=TRUE);
percentage <- round(pcaObject$sdev^2 / sum(pcaObject$sdev^2) * 100, 2);
df <- as.data.frame(pcaObject$x);
percentageLabs <- paste( colnames(df), "-", paste( as.character(percentage), "%", sep="") );
Show the results as an annotated plot
ggplot(X, aes_string(x="PC1", y="PC2", color="Condition"))+
geom_point() +
labs(title = "PCA") +
xlab(percentageLabs[1]) + ylab(percentageLabs[2]) +
geom_hline(yintercept=0) + geom_vline(xintercept=0);
showGeneLoadings <- function(pcaObject, N=30, numberOfPCToShow=4){
gl <- as.data.frame(pcaObject$rotation);
gl <- merge(x=gl, y=diffExp[,c("ensembl_gene_id","mgi_symbol")], by.x="row.names", by.y="ensembl_gene_id", all.x=TRUE);
geneLoadings = array(data=0, dim=numberOfPCToShow);
for(g in 1:numberOfPCToShow){
p=paste("PC",g,sep="");
if(p %in% colnames(gl)){
gl.cur <- gl[, c("Row.names", "mgi_symbol", p)];
names(gl.cur) = c("Ensembl", "Symbol", "Loading");
gl.pc <- head(gl.cur[order(abs(gl.cur$Loading), decreasing=TRUE),], n=N); # pick the top N gene loadings
sumInfluence=round(sum(t(abs(gl.pc$Loading))), digits=4);
gl.pc$Symbol <- reorder(gl.pc$Symbol, gl.pc$Loading); # needed in order to make ggplot order the data in the plot
gPlot <- ggplot(gl.pc, aes(x=Loading, y=factor(Symbol)))+
geom_point(color="Blue", aes(order=Loading))+
labs(x="Loading",y="Gene",title=sprintf("Top %i genes in %s", N, p));
print(gPlot);
}
}
}
showGeneLoadings(pcaObject=pcaObject)
What we need:
1) IGV - the actual program
2) genome file (.fa) - “comes with” IGV
3) sorted .bam file of interest(s)
4) index of the sorted .bam file of interest(s), eg the .bai files
Inside IGV:
Genomes
Load genomes from server
Select 'Mouse mm10'
Inside IGV:
File
Load from file
Select one of the .bam files - note that the .bam and the .bam.bai have to be located in the same folder
Do the above for all six .bam files
Inside IGV:
In the textbox (top middle), type Taz and then click on the Go button
https://bioconductor.org/packages/release/bioc/html/Gviz.html
https://bioconductor.org/packages/release/bioc/vignettes/Gviz/inst/doc/Gviz.pdf
Plot CpG islands on chromosome 7 from the human genome hg19 - for a full descriptions of what the commands means, browse through the vignette on your own
data(cpgIslands);
chr <- as.character(unique(seqnames(cpgIslands)))
gen <- genome(cpgIslands);
atrack <- AnnotationTrack(cpgIslands, name = "CpG");
plotTracks(atrack);
gtrack <- GenomeAxisTrack();
plotTracks(list(gtrack, atrack));
itrack <- IdeogramTrack(genome = gen, chromosome = chr);
plotTracks(list(itrack, gtrack, atrack));
data(geneModels);
grtrack <- GeneRegionTrack(geneModels, genome = gen, chromosome = chr, name = "Gene Model")
plotTracks(list(itrack, gtrack, atrack, grtrack));
plotTracks(list(itrack, gtrack, atrack, grtrack), from = 26700000, to = 26750000);
plotTracks(list(itrack, gtrack, atrack, grtrack), extend.left = 0.5, extend.right = 1000000);
strack <- SequenceTrack(Hsapiens, chromosome = chr);
plotTracks(list(itrack, gtrack, atrack, grtrack, strack),
from = 26591822, to = 26591852, cex = 0.8);
grtrack <- GeneRegionTrack(geneModels, genome = gen,
chromosome = chr, name = "Gene Model", transcriptAnnotation = "symbol",
background.title = "brown");
plotTracks(list(itrack, gtrack, atrack, grtrack));
Find the list of all BSgenomes - download the right one
- http://bioconductor.org/packages/release/bioc/html/BSgenome.html
Find the mouse package name
install the package for mouse
Load the .bam file
dTrack4 <- DataTrack(range = "SRR3222411_Aligned.out.bam.sorted.bam", genome = "mm10",
type = "h", name = "Coverage", window = NULL,
options(ucscChromosomeNames=FALSE));
The ucscChromosomeNames=FALSE is important if the chromosome names in the .bam file does not contain the chr part, to get a complete list of chromosome names use samtools from the command line
samtools view SRR3222411_Aligned.out.bam.sorted.bam | cut -f 3 | sort | uniq -c
plotTracks(dTrack4, from = 74280718, to = 74292151, chromosome="X");
Have a look at the settings sections of the Gviz manual (https://bioconductor.org/packages/devel/bioc/manuals/Gviz/man/Gviz.pdf)
gtrack <- GenomeAxisTrack(col="#FF00FF", cex=1.2);
itrack <- IdeogramTrack(genome = gen, chromosome = chr, fontcolor="red");
plotTracks(list(itrack, gtrack, atrack, grtrack));
To see the available display parameters
availableDisplayPars("GenomeAxisTrack");
availableDisplayPars("IdeogramTrack");
More Gviz examples http://www.sthda.com/english/wiki/print.php?id=43
An MA plot is an application of a Bland-Altman plot for visual representation of two channel DNA microarray gene expression data which has been transformed onto the M (log ratios) and A (mean average) scale. (from wikipedia)
For weakly expressed genes, we have no chance of seeing differential expression, because the low read counts suffer from such high Poisson noise that any biological effect is drowned in the uncertainties from the sampling at a low rate. (http://www.bioconductor.org/help/workflows/rnaseqGene/)
An MA-plot is a plot of log-fold change (M-values, i.e. the log of the ratio of level counts for each gene between two samples) against the log-average (A-values, i.e. the average level counts for each gene across the two samples). The MA-plot is a useful to visualize reproducibility between samples of an experiment. From a MA-plot one can see if normalization is needed. In MA plot, genes with similar expression levels in two samples will appear around the horizontal line y = 0 (blue). A lowess t (in red) is plotted underlying a possible trend in the bias related to the mean expression. (http://www.nathalievilla.org/doc/pdf/tutorial-rnaseq.pdf)
makeMAPlot <- function(dat, conditionColumns, caseColumns){
x = rowMeans(as.matrix(dat[,conditionColumns]));
y = rowMeans(as.matrix(dat[,caseColumns]));
M = x - y; ## M-values
A = (x + y)/2; ## A-values
df = data.frame(A, M);
ggplot(df, aes(x = A, y = M)) +
geom_point(size = 1.5, alpha = 1/5) +
geom_hline(color = 'blue3', yintercept=0) +
stat_smooth(se = FALSE, method = 'loess', color = 'red3');
}
makeMAPlot(pseudoCount, 1:3, 4:6);
v <- voom(onlyCounts);
makeMAPlot(v, 1:3, 4:6);
maxCountPerGeneAndLength <- summarise(group_by(tableCounts, Geneid, Length), maxCount=max(KO_1,KO_2, KO_3, WT_1, WT_2, WT_3));
ggplot(maxCountPerGeneAndLength, aes(x=Length, y=maxCount))+
geom_point();
The above is not entirely useful, so log-transform your values
ggplot(maxCountPerGeneAndLength, aes(x=Length, y=log2(maxCount)))+
geom_point();
ggplot(maxCountPerGeneAndLength, aes(x=log2(Length), y=log2(maxCount)))+
geom_point();
In the bottom right plot panel
Export
PDF
pdf(file="volcanoPlot.pdf");
g <- ggplot(data=diffExp, aes(x=logFC, y=-log10(PValue), colour=threshold)) +
geom_point(alpha=0.4, size=1.75) +
theme(legend.position = "none") +
xlim(c(-10, 10)) + ylim(c(0, 15)) +
xlab("log2 fold change") + ylab("-log10 p-value")
print(g);
dev.off() # not a good idea to run from inside your RMarkdown sesssion!
Note that if you run the above code (eg the pdf and dev commands) you will no longer see the plot from inside RStudio!
Note you can easily make a multipage PDF file containing all plots by calling pdf() at the start of your R session and dev.off() at the end…
It is possible to “tell” RMarkdown to generate a “picture” of each plot
knitr::opts_chunk$set(fig.path = 'plots/', dev = 'pdf'); # save all plots in the plots folder so I can access them from powerpoint as well..
The above will create a folder called plots and then make a .pdf file for each.
Of course these type of commands should be part of the ‘setup’ section at the top of your RMarkdown document.
Instead when creating a plot, try naming the code chunk eg to call the code chunk VolcanoPlot simply add it at the top of the current code chunk (after the r add a space followed by the name of the chunk).
ggplot(data=diffExp, aes(x=logFC, y=-log10(PValue), colour=threshold)) +
geom_point(alpha=0.4, size=1.75) +
theme(legend.position = "none") +
xlim(c(-10, 10)) + ylim(c(0, 15)) +
xlab("log2 fold change") + ylab("-log10 p-value")
## Warning: Removed 82 rows containing missing values (geom_point).
now this plot will be called VolcanoPlot-1.pdf in the plots folder, and when you knit your document these names are shown at the bottom of your RStudio